annual income
Intersectional Bias in Japanese Large Language Models from a Contextualized Perspective
Yanaka, Hitomi, He, Xinqi, Lu, Jie, Han, Namgi, Oh, Sunjin, Kumon, Ryoma, Matsuoka, Yuma, Watabe, Katsuhiko, Itatsu, Yuko
An increasing number of studies have examined the social bias of rapidly developed large language models (LLMs). Although most of these studies have focused on bias occurring in a single social attribute, research in social science has shown that social bias often occurs in the form of intersectionality -- the constitutive and contextualized perspective on bias aroused by social attributes. In this study, we construct the Japanese benchmark inter-JBBQ, designed to evaluate the intersectional bias in LLMs in the question-answering setting. Using inter-JBBQ to analyze GPT-4o and Swallow, we find that biased output varies according to its context even with the same combination of social attributes.
Alignment Revisited: Are Large Language Models Consistent in Stated and Revealed Preferences?
Gu, Zhuojun, Wang, Quan, Han, Shuchu
Recent advances in Large Language Models (LLMs) highlight the need to align their behaviors with human values. A critical, yet understudied, issue is the potential divergence between an LLM's stated preferences (its reported alignment with general principles) and its revealed preferences (inferred from decisions in contextualized scenarios). Such deviations raise fundamental concerns for the interpretability, trustworthiness, reasoning transparency, and ethical deployment of LLMs, particularly in high-stakes applications. This work formally defines and proposes a method to measure this preference deviation. We investigate how LLMs may activate different guiding principles in specific contexts, leading to choices that diverge from previously stated general principles. Our approach involves crafting a rich dataset of well-designed prompts as a series of forced binary choices and presenting them to LLMs. We compare LLM responses to general principle prompts (stated preference) with LLM responses to contextualized prompts (revealed preference), using metrics like KL divergence to quantify the deviation. We repeat the analysis across different categories of preferences and on four mainstream LLMs and find that a minor change in prompt format can often pivot the preferred choice regardless of the preference categories and LLMs in the test. This prevalent phenomenon highlights the lack of understanding and control of the LLM decision-making competence. Our study will be crucial for integrating LLMs into services, especially those that interact directly with humans, where morality, fairness, and social responsibilities are crucial dimensions. Furthermore, identifying or being aware of such deviation will be critically important as LLMs are increasingly envisioned for autonomous agentic tasks where continuous human evaluation of all LLMs' intermediary decision-making steps is impossible.
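As a rough illustration of the comparison described in the abstract above, the sketch below computes a KL divergence between two forced-binary-choice distributions. The option labels, the sample responses, the add-one smoothing, and the function names are all assumptions made for this example; the paper's actual prompts and estimation procedure are not given in the abstract.

```python
import math

def choice_distribution(choices, options=("A", "B")):
    """Turn a list of forced binary choices into a probability distribution.

    Add-one smoothing is an assumption here; it keeps every probability
    non-zero so the KL divergence stays finite.
    """
    counts = [choices.count(option) + 1 for option in options]
    total = sum(counts)
    return [count / total for count in counts]

def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical responses: 10 answers to a general-principle prompt and
# 10 answers to a contextualized version of the same question.
stated = choice_distribution(["A"] * 9 + ["B"] * 1)
revealed = choice_distribution(["A"] * 3 + ["B"] * 7)

deviation = kl_divergence(stated, revealed)
print(f"KL(stated || revealed) = {deviation:.3f} nats")
```

A value of zero would mean the two response distributions coincide; larger values indicate a larger gap between stated and revealed choices. KL divergence is asymmetric, so the direction of comparison should be fixed in advance.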
Reassessing Evaluation Functions in Algorithmic Recourse: An Empirical Study from a Human-Centered Perspective
Tominaga, Tomu, Yamashita, Naomi, Kurashima, Takeshi
In this study, we critically examine the foundational premise of algorithmic recourse - a process of generating counterfactual action plans (i.e., recourses) assisting individuals to reverse adverse decisions made by AI systems. The assumption underlying algorithmic recourse is that individuals accept and act on recourses that minimize the gap between their current and desired states. This assumption, however, remains empirically unverified. To address this issue, we conducted a user study with 362 participants and assessed whether minimizing the distance function, a metric of the gap between the current and desired states, indeed prompts them to accept and act upon suggested recourses. Our findings reveal a nuanced landscape: participants' acceptance of recourses did not correlate with the recourse distance. Moreover, participants' willingness to act upon recourses peaked at the minimal recourse distance but was otherwise constant. These findings cast doubt on the prevailing assumption of algorithmic recourse research and signal the need to rethink the evaluation functions to pave the way for human-centered recourse generation.
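To make the notion of a recourse distance concrete, here is a minimal sketch using a scaled L1 distance between a person's current features and a suggested recourse. This particular distance, the feature names, the scale values, and the example numbers are assumptions for illustration only; the abstract does not state which distance function the study used.

```python
def recourse_distance(current, proposed, scales):
    """Scaled L1 distance between the current state and a suggested recourse.

    Each feature difference is divided by a per-feature scale so that
    features with different units become comparable.
    """
    return sum(abs(proposed[name] - current[name]) / scales[name] for name in current)

# Hypothetical applicant and two hypothetical recourses.
current = {"annual_income": 40000, "credit_cards": 4}
recourse_a = {"annual_income": 44000, "credit_cards": 4}  # raise income
recourse_b = {"annual_income": 40000, "credit_cards": 2}  # close two cards
scales = {"annual_income": 20000, "credit_cards": 2}      # assumed feature scales

dist_a = recourse_distance(current, recourse_a, scales)
dist_b = recourse_distance(current, recourse_b, scales)
print(dist_a, dist_b)
```

Under this assumed metric, a generator that minimizes distance would suggest the first recourse. The study's finding is that a smaller distance of this kind did not by itself predict whether participants accepted a recourse.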
SHAP: Explain Any Machine Learning Model in Python
This article is part of a series where we walk step by step in solving fintech problems with Machine Learning using "All lending club loan data". In previous articles, we prepared a dataset and built a Logistic Regression model, and we discussed the most common "ML model evaluation metrics" for a classification problem in the fintech space. This article will try to "understand" how our model decision works and what packages can help us to answer this question. Machine learning models are frequently named "black boxes". They produce highly accurate predictions.
The State Of Data, April 2021
Data is eating the world and there are numerous indicators of its ubiquitous presence in our lives and how it makes businesses and consumers both anxious and animated. Data dominates our deeds, debates, and dreams. "Covid has only accelerated the digital transformation, and automation is the cornerstone of digital transformation services"--Daniel Dines, co-founder and CEO, UiPath, Robotic Process Automation (RPA) startup whose revenues increased 81% in 2020 and its April 20, 2021, IPO, valued it at $36 billion "…the whole reason [AI] takes so long in the first place is that it's not easy"-- Erik Brynjolfsson, director of the Stanford Digital Economy Lab "When the [NFT] bubble bursts, it's not going to wipe out this technology. It's just going to wipe out the junk"--Beeple (artist Mike Winkelmann whose NFT-certified digital mosaic piece sold for $69 million) "Data is now at the center of global trade… Digital technologies trafficking in data now enable, and in some cases have replaced, traditional trade in goods and services… The global economy has become a perpetual motion machine of data: it consumes it, processes it, and produces ever more quantities of it"--David H. McCormick and Matthew J. Slaughter "We've been talking about home robots coming for a long time, and all we have so far is the vacuum cleaner"--Jeff Burnstein, President, Association for Advancing Automation "As a supply-chain provider, as a logistics provider, we are very much in the data business"--Mario Harik, CIO, XPO Logistics "People are getting confused about the meaning of AI in discussions of technology trends--that there is some kind of intelligent thought in computers that is responsible for the progress and which is competing with humans. We don't have that, but people are talking as if we do"--Michael Jordan, University of California, Berkeley
Customer Segmentation Using K Means Clustering - KDnuggets
Customer Segmentation is the subdivision of a market into discrete customer groups that share similar characteristics. Customer Segmentation can be a powerful means to identify unsatisfied customer needs. Using such data, companies can then outperform the competition by developing uniquely appealing products and services. You own a supermarket mall and, through membership cards, you have some basic data about your customers, such as customer ID, age, gender, annual income and spending score. You want to understand who the target customers are so that this insight can be passed to the marketing team, which can then plan its strategy accordingly.
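The segmentation idea can be sketched with a small, dependency-free version of K-means (Lloyd's algorithm) on two of the features mentioned above, annual income and spending score. The customer values, the choice of k = 2, and the function name are made up for illustration; this is not the article's code, and a real analysis would more likely use a library implementation and scale the features first.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm: alternate assignment and centroid update."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k distinct points as starting centroids
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid (squared Euclidean).
        labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels

# Hypothetical customers: (annual income in thousands, spending score 1-100).
customers = [(15, 80), (16, 78), (18, 82), (85, 15), (88, 12), (90, 18)]
centroids, labels = kmeans(customers, k=2)
print(centroids)
print(labels)
```

On this toy data the low-income, high-spending customers end up in one cluster and the high-income, low-spending customers in the other; which cluster receives label 0 depends on the random initialization.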
Customer Segmentation Using K Means Clustering - WebSystemer.no
I started with loading all the libraries and dependencies. The columns in the dataset are customer id, gender, age, income and spending score. I dropped the id column as that does not seem relevant to the context. Also I plotted the age frequency of customers. Next I made a box plot of spending score and annual income to better visualize the distribution range. The range of spending score is clearly more than the annual income range.
Machine Learning For Grannies
She just finished over-feeding you with her delicious food, and now wants you to fix her Skype account. "It's not working" she complains. Turns out she somehow managed to get the trojan horse virus. You restore her computer, create a new Skype account and everything is fine. "Tell me what you're doing in school!" she asks.
AI disruption: Is robo-advisory a threat to financial planners?
The fintech industry has been revolutionizing the sector in the past few years, with disruptive technologies such as machine learning, artificial intelligence, social data intelligence and blockchain, which are proving to be enablers in solving critical business problems. According to fintech experts, robos are already doing 75 percent of the distributor's job. For instance, these automated advisers ask you your age and a few questions about risk, goals and your time horizon. Then the platform runs your choices through their algorithms. Next, you are assigned a risk level ranging from conservative to aggressive.
Business intuition in data science
Often when we think of a data science assignment, the main thing that comes to mind is the algorithm or technique that needs to be applied. While that is crucially important, there are many other steps in a typical data science assignment that require equal attention. Consider an online retailer that is running a shopping festival in the month of November, just before the holiday season. It has a catalogue of a million products and a database of 100 million customers who have bought from it in the past. The retailer wants to run promotional email campaigns to its customer base.